Add parent dimensions into calculation component table #2753
… in parent dimensions
Codecov Report

Coverage diff:

| | explode_ferc1 | #2753 | +/- |
|---|---|---|---|
| Coverage | 88.4% | 88.4% | |
| Files | 88 | 89 | +1 |
| Lines | 10675 | 10711 | +36 |
| Hits | 9440 | 9476 | +36 |
| Misses | 1235 | 1235 | |
Maybe we can collaborate on some docstrings on the phone? You explain and I try to translate?
src/pudl/transform/ferc1.py
Outdated
Then, we assume that every parent factoid should have the same dimensions as its
calculation component dimensions.
I think you mean that every parent factoid should have the same dimension values as its calculation components? E.g. if a parent has `utility_type == "electric"` then all of its components are also electric? Not that if a parent has a `utility_type` dimension then the child should also have a `utility_type` dimension.

Aren't parent nodes with the value `total` in any dimension an exception to this?

Maybe we mean "With the exception of parents having a value of `total` in some dimension, we assume that all parents should have the same dimension values as the calculation components that they are composed of."
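The reworded assumption can be sketched as a small check. This is only an illustration using plain Python records in place of the real dataframe: the `utility_type_parent` column name, the `mismatched_dimensions` helper, and the toy factoid names are all hypothetical, not taken from `ferc1.py`.

```python
# Hypothetical subset of dimension columns; the real table may have more.
DIMENSIONS = ["utility_type"]


def mismatched_dimensions(record, dimensions=DIMENSIONS):
    """Return the dimensions where parent and component values disagree.

    A parent value of "total" is treated as the exception discussed above:
    a total parent may be composed of components with any value.
    Assumes parent dimension values live in "<dimension>_parent" columns.
    """
    bad = []
    for dim in dimensions:
        parent_value = record.get(f"{dim}_parent")
        if parent_value == "total":
            continue
        if parent_value != record.get(dim):
            bad.append(dim)
    return bad


# Toy calculation-component records (made up for illustration).
calc_components = [
    {"xbrl_factoid_parent": "parent_fact", "utility_type_parent": "electric",
     "xbrl_factoid": "child_a", "utility_type": "electric"},
    {"xbrl_factoid_parent": "parent_fact", "utility_type_parent": "total",
     "xbrl_factoid": "child_b", "utility_type": "gas"},
    {"xbrl_factoid_parent": "parent_fact", "utility_type_parent": "electric",
     "xbrl_factoid": "child_c", "utility_type": "gas"},
]

# Only the electric parent with a gas component breaks the assumption.
violations = [r for r in calc_components if mismatched_dimensions(r)]
```

Under this reading, the `total` parent passes even though its component is gas, while the electric parent with a gas component is flagged.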
Could you add a description of what we expect to be true of the dataframe that's reported at the end of this process? Or maybe you can explain it to me on the phone and I could write it?
I added some context on what these tables look like coming into this function, as well as what actually happens and what is expected on the way out. Very happy to chat on the phone about it. I also recommend looking at the unit test.
src/pudl/transform/ferc1.py
Outdated
calc_comp_idx = [
    "table_name_parent",
    "xbrl_factoid_parent",
    "table_name",
    "xbrl_factoid",
]
Do the PKs necessarily not include the dimension columns? E.g. couldn't there be cases where we define different calculations depending on `utility_type`? This almost happens where we have one calculation with lots of details from the `plant_in_service_ferc1` table, and a different, simpler calculation for the gas and other utility types. In that case I think the calculation components happen to be reported in different tables, but does that necessarily have to be true?

Or is it the case that at this point in the processing of the calculation components table we're still adding values to the dimension columns, and at the end of this step they're going to be part of the PKs in a way that they weren't before? That seems true with respect to the addition of the `total` calculations, but is it true more generally?
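One way to probe this question is to test whether the four index columns alone identify each row, or whether a dimension column is needed too. The sketch below uses plain Python records as a stand-in for the dataframe; the `duplicated_keys` helper and the toy rows are hypothetical, and `utility_type` stands in for whatever dimension columns the table actually carries.

```python
from collections import Counter

calc_comp_idx = [
    "table_name_parent",
    "xbrl_factoid_parent",
    "table_name",
    "xbrl_factoid",
]


def duplicated_keys(records, key_cols):
    """Return the key tuples that appear on more than one record."""
    counts = Counter(tuple(r.get(col) for col in key_cols) for r in records)
    return [key for key, n in counts.items() if n > 1]


# Toy rows (made up): the same parent/component pair, once per utility type.
records = [
    {"table_name_parent": "parent_table", "xbrl_factoid_parent": "parent_fact",
     "table_name": "child_table", "xbrl_factoid": "child_fact",
     "utility_type": "electric"},
    {"table_name_parent": "parent_table", "xbrl_factoid_parent": "parent_fact",
     "table_name": "child_table", "xbrl_factoid": "child_fact",
     "utility_type": "gas"},
]

# Without the dimension column the two rows collide on the same key.
dupes_without_dims = duplicated_keys(records, calc_comp_idx)
# With the dimension column included, every row is uniquely identified.
dupes_with_dims = duplicated_keys(records, calc_comp_idx + ["utility_type"])
```

If a check like this finds duplicates on `calc_comp_idx` alone in the real table, that would suggest the dimension columns are effectively part of the primary key at that stage of processing.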
PR Overview
This PR does three main things:
- `_parent` suffix for the parent fact/table columns instead of a `_calc` suffix for all of the calculation components

I recommend starting from the bottom of the file changes: test -> transform -> output.